Theory behind bitop contrast operation

Contrast stretching creates negative intermediate values, which
are hard to store in bit-packed colors.

(c-m)*k+b
c*k+b-m*k
c*k+b-b*k      (taking m = b)
c*2-16         (k = 2, b = 16)
c*4-16*3       (k = 4, b = 16)

The subtraction of m*k is a constant. Negative values are 
avoided if c*k+b is always greater than or equal to m*k.
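
As a reference point for the packed versions, the per-channel
arithmetic can be sketched in C. This is a minimal sketch, assuming
5-bit channels clamped to 0..31; the function name contrast_ref is
hypothetical and not part of the original notes.

```c
/* Reference contrast for one 5-bit channel: (c - m) * k + b.
   The clamp to 0..31 is an assumption about the target color depth. */
int contrast_ref(int c, int m, int k, int b)
{
    int v = c * k + b - m * k;   /* same value as (c - m) * k + b */
    if (v < 0)  v = 0;           /* the negative case the notes want to avoid */
    if (v > 31) v = 31;
    return v;
}
```

With m = b = 16 and k = 2 this reduces to c*2-16 before clamping.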

When the amount subtracted is a power of two this becomes
practical. For example, fixing m to the 5-bit mean of 16 and k
to 2, m*k is 32. c*2+b needs 7 bits per channel, so the single
bit that reliably says "large enough" is the top one, worth 64.
Folding an extra 32 into b makes the amount to subtract 64. If
we can test which channels of c*2+b are less than 64 and set
them to 64, then we can use subtraction.

c     00 000r rrrr 0000 0ggg gg00 000b bbbb
c*2   00 00rr rrr0 0000 gggg g000 00bb bbb0
c*2+b 00 0rrr rrrr 000g gggg gg00 0bbb bbbb
64    00 0100 0000 0001 0000 0000 0100 0000
64    0x4010040

Channels of at least 64 have the 64 bit (bit 6) set to 1.
To clip before subtraction we should set every channel below
64 to 64. A test mask grabs the 64 bits:

c & 0x4010040

Channels that are too small will be missing their bit. Shift
the result right 6 bits and multiply by 0b111111 (0x3f) to make
a mask.

000100000000010000000001000000
000000000100000000010000000001
000011111100001111110000111111

AND this mask with the color to zero the channels below 64.
Equivalently, shift the color right 6 bits and use the shifted
mask 0x100401. The 64 bit itself is zeroed out as well, which is
the same as subtracting 64, so this one step both clips and
subtracts. (b below is assumed to already carry the extra 32
per channel.)

c*=2;
c+=b;
c&=(c>>6&0x100401)*0x3f;
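
Read per channel, the last line above maps a 7-bit value v to
v - 64 when bit 6 is set and to 0 otherwise. The following is a
small sketch of just that step. The helper names pack3, field and
floor_sub64 are hypothetical, and the value of I15M is inferred
from the 0x100401 mask rather than given in these notes.

```c
#include <stdint.h>

#define I15M 0x100401u  /* assumed: a 1 in the lowest bit of each 10-bit field */

/* Pack three channel values into 10-bit fields (red high, blue low). */
uint32_t pack3(uint32_t r, uint32_t g, uint32_t b)
{
    return (r << 20) | (g << 10) | b;
}

/* Extract the 10-bit field at position i (0 = blue, 1 = green, 2 = red). */
uint32_t field(uint32_t c, int i)
{
    return (c >> (10 * i)) & 0x3ffu;
}

/* Per channel: v >= 64 becomes v - 64, v < 64 becomes 0 (valid for v < 128). */
uint32_t floor_sub64(uint32_t c)
{
    return c & (((c >> 6) & I15M) * 0x3fu);
}
```

The multiply by 0x3f cannot carry between fields because each
field holds at most a single 1 bit before it.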

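We have dealt with the subtraction and avoided negative values,
but some channels will still be too large for 5 bits. In this
example the result is at most 61, so only one bit of overflow
is possible. To detect overflow, shift right by 5 and use our
mask again. Channels that overflowed are filled with ones, and
the final AND with W15M keeps the low 5 bits of each channel.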

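c *= 2;
c += b;
c &= (c>>6 & 0x100401) * 0x3f;   // clip from below and subtract
c |= (c>>5 & 0x100401) * 0x1f;   // fill overflowed channels with ones
c &= W15M;                       // drop the overflow bit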
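
The five steps can be checked against plain per-channel arithmetic.
This is a sketch under stated assumptions, not the author's code:
the values of I15M and W15M are inferred from the masks used
above, the function names are hypothetical, and the brightness
term is given an extra 32 per channel so that the 64 removed by
the bit-6 test corresponds to m*k = 32.

```c
#include <stdint.h>

#define I15M 0x100401u        /* assumed: 1 in the low bit of each 10-bit field */
#define W15M (I15M * 0x1fu)   /* assumed: 0x1f in each field (5-bit white) */

/* Packed (c - 16) * 2 + b, clamped to 0..31 per channel.
   c and b are packed 5-bit colors; the +32 bias is this sketch's assumption. */
uint32_t contrast2_packed(uint32_t c, uint32_t b)
{
    c *= 2;
    c += b + I15M * 32u;                /* per channel: 32..125, fits 7 bits */
    c &= ((c >> 6) & I15M) * 0x3fu;     /* max(v - 64, 0) */
    c |= ((c >> 5) & I15M) * 0x1fu;     /* values >= 32 get all five low bits */
    return c & W15M;                    /* drop the overflow bit */
}

/* Per-channel reference for comparison. */
int contrast2_ref(int c, int b)
{
    int v = (c - 16) * 2 + b;
    return v < 0 ? 0 : v > 31 ? 31 : v;
}
```

For example, a channel with c = 20 and b = 16 comes out as 24 in
both versions.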

This can be generalized by adding some value so that the
amount we then need to subtract is a power of two. That gives a
general solution suitable for homeostatic contrast and brightness.

(c-m)*k+b
c*k-m*k+b
c*k+b-m*k
(c*k*n+b*n+n/2-m*k*n) / n

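Here n is a power-of-two scale and n/2 rounds the final division.
Call the part we keep a and the part we subtract s:

a = c*k*n+b*n+n/2
s = m*k*n

With 5-bit c, b and m, and k at most 2, the bounds are:

min a = n/2
max a = 31*2*n+31*n+n/2
s <= 31*2*n < 2^6 * n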
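
The scaled form is easiest to see for a single channel. This is a
minimal sketch assuming n is a power of two and that k is supplied
pre-multiplied as the integer kn = k*n; both kn and the function
name are hypothetical. It applies the zero floor but leaves the
upper clamp out.

```c
/* Rounded fixed-point (c - m) * k + b for one channel, floored at zero.
   kn is k * n as an integer (hypothetical parameter); n is the scale. */
int contrast_fixed(int c, int m, int kn, int b, int n)
{
    int a = c * kn + b * n + n / 2;   /* part we keep, with the rounding term */
    int s = m * kn;                   /* part we subtract */
    if (a < s) a = s;                 /* raise to s so a - s cannot go negative */
    return (a - s) / n;
}
```

With n = 8 and kn = 12 (k = 1.5), c = 17, m = 16, b = 16 gives 18,
which is 17.5 rounded up.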

There are five overflow bits available in each 10-bit field. The
top bit is for subtraction. The next bit down is for overflow
above the maximum. That leaves three bits. The multiplication can
be rounded right away or rounding can be delayed. I suggest
preserving two bits from the multiplication, temporarily using
7-bit color. Since contrast may be as large as 2, this is
actually 8-bit color.

c = c*contrast + (I15M<<2) >> 3 & I15M*0x7f;

Now we have 7-bit color with 3 bits available for overflow.
We need to adjust m and b to also be 7-bit colors:

b = b<<2;
m = m*contrast + (I15M<<2) >> 3 & I15M*0x7f;

Add in the brightness:

c = c+b;
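
The scaling and brightness steps can be sketched as two small
functions. The value of I15M and the names scale7 and
scale_and_brighten are assumptions made for illustration. The
sketch also assumes that v*contrast + 4 stays below 1024 in every
channel, since a larger product would carry into the neighboring
10-bit field.

```c
#include <stdint.h>

#define I15M 0x100401u   /* assumed: 1 in the low bit of each 10-bit field */

/* Scale a packed 5-bit color to 7 bits: per channel (v*contrast + 4) >> 3.
   Assumes v*contrast + 4 < 1024 in every channel so fields cannot collide. */
uint32_t scale7(uint32_t c, uint32_t contrast)
{
    /* the shift spills low bits of each field into the field below;
       the 0x7f mask removes them again */
    return ((c * contrast + (I15M << 2)) >> 3) & (I15M * 0x7fu);
}

/* Scaled color plus brightness widened from 5 to 7 bits. */
uint32_t scale_and_brighten(uint32_t c, uint32_t b, uint32_t contrast)
{
    return scale7(c, contrast) + (b << 2);
}
```

Under these assumptions contrast = 32 corresponds to a gain of 1:
a channel value of 16 scales to 64, which is 16 with two extra
bits.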

What is the smallest power of two guaranteed to be no less
than m? The largest value m can take on is:

(31*2**6 + 2**2)/2**3

This is 248 after truncation. So we will end up adding a bias
that might be as large as 256, then subtracting 256. The largest
value before subtraction is then 4*31 + 248 + 256 = 628, which
takes 10 bits to store. This is perfect.
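
The two bounds are quick to confirm with integer arithmetic. This
is a small sketch; the function names are hypothetical.

```c
/* Largest scaled mean: 5-bit m = 31, contrast = 2**6, rounded shift by 3. */
int max_scaled_mean(void)
{
    return (31 * 64 + 4) >> 3;
}

/* Largest value before subtracting 256: widened brightness 31 << 2,
   largest scaled color, and the largest bias (256 when the scaled mean is 0). */
int max_before_subtract(void)
{
    return (31 << 2) + max_scaled_mean() + 256;
}
```

Both fit the text: 248 for the scaled mean, and 628, which is
below 1024 and so fits a 10-bit field.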

c = (c*contrast + (I15M<<2) >> 3 & I15M*0x7f) + (b<<2) + (I15M*0x100-m);

Now we have a 10-bit number, and we want to subtract 256 from
it, but we don't want to go negative. So, if the number is
less than 256 (0x100), we set it to 256. In a 10-bit field the
bits worth 256 and 512 are bits 8 and 9. If either of these is
1, then our number is at least 256. If neither is 1, our
number is too small.

mask = c>>8&I15M|c>>9&I15M;
c = (c&mask*0x3ff)|(I15M*256&(mask^I15M)*0x3FF);
c = c-256*I15M;
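
The three lines above can be wrapped into one function and
checked on a few packed values. This is a sketch assuming I15M is
0x100401 and that every channel fits its 10-bit field; the name
floor_sub256 is hypothetical.

```c
#include <stdint.h>

#define I15M 0x100401u   /* assumed: 1 in the low bit of each 10-bit field */

/* Per channel: max(v, 256) - 256 for a 10-bit value v. */
uint32_t floor_sub256(uint32_t c)
{
    /* 1 in a field's low bit when bit 8 or bit 9 is set, i.e. v >= 256 */
    uint32_t mask = ((c >> 8) & I15M) | ((c >> 9) & I15M);
    /* keep v where the test passed, substitute 256 where it failed */
    c = (c & (mask * 0x3ffu)) | ((I15M * 256u) & ((mask ^ I15M) * 0x3ffu));
    /* every field is now at least 256, so no borrow crosses fields */
    return c - 256u * I15M;
}
```

For instance, channels holding 628, 255 and 256 come out as 372,
0 and 0.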

If all went well, each channel now holds the difference with a
floor of zero. We still need to clip it down to the correct
range. It is supposed to be a 7-bit color that may overflow into
an 8th bit, except that the brightness adjustment can push it
into a 9th. So if bit 7 or bit 8 is set, the channel is too big
and we saturate it to 0x7f. Shifting right by 2 then drops the
two extra bits, and W15M keeps the 5-bit result.

c = (c | (c>>7&I15M|c>>8&I15M)*0x7f) >> 2 & W15M;

The total program then is:

m = m*contrast + (I15M<<2) >> 3 & I15M*0x7f;
c = (c*contrast + (I15M<<2) >> 3 & I15M*0x7f) + (b<<2) + (I15M*0x100-m);
int mask = c>>8&I15M|c>>9&I15M;
c = (c&mask*0x3ff)|(I15M*256&(mask^I15M)*0x3FF);
c = c-256*I15M;
c = (c | (c>>7&I15M|c>>8&I15M)*0x7f) >> 2 & W15M;
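
To try the whole sequence, it can be wrapped in a function and
compared with a per-channel version of the same steps. This is a
sketch under several assumptions rather than a definitive
implementation: the values of I15M and W15M are inferred from the
masks used earlier, the function names are hypothetical, the
final shift right by 2 (to return to 5-bit color) is this
sketch's reading of the last step, and contrast is limited to at
most 32 so that 31*contrast + 4 stays below 1024 and the packed
multiplies stay inside their 10-bit fields.

```c
#include <stdint.h>

#define I15M 0x100401u        /* assumed: 1 in the low bit of each 10-bit field */
#define W15M (I15M * 0x1fu)   /* assumed: 0x1f in each field (5-bit white) */

/* Packed contrast with gain contrast/32, clamped to 0..31 per channel.
   c, m, b are packed 5-bit colors. Assumes contrast <= 32. */
uint32_t contrast_packed(uint32_t c, uint32_t m, uint32_t b, uint32_t contrast)
{
    m = ((m * contrast + (I15M << 2)) >> 3) & (I15M * 0x7fu);
    c = (((c * contrast + (I15M << 2)) >> 3) & (I15M * 0x7fu))
        + (b << 2) + (I15M * 0x100u - m);
    uint32_t mask = ((c >> 8) & I15M) | ((c >> 9) & I15M);
    c = (c & (mask * 0x3ffu)) | ((I15M * 256u) & ((mask ^ I15M) * 0x3ffu));
    c -= 256u * I15M;
    /* saturate to 0x7f, then drop the two extra bits (the >> 2 is an assumption) */
    return ((c | ((((c >> 7) & I15M) | ((c >> 8) & I15M)) * 0x7fu)) >> 2) & W15M;
}

/* Per-channel version of the same steps, for checking. */
int contrast_channel(int c, int m, int b, int contrast)
{
    int v = ((c * contrast + 4) >> 3) + (b << 2) + 256 - ((m * contrast + 4) >> 3);
    v = (v < 256 ? 256 : v) - 256;   /* zero floor */
    if (v > 127) v = 127;            /* saturate the 7-bit color */
    return v >> 2;                   /* back to 5 bits */
}
```

With contrast = 32 and c = m = b = 16 in a channel, both versions
return 16 for that channel.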